Overview

Dataset statistics

Number of variables: 10
Number of observations: 25751
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 3.1 MiB
Average record size in memory: 125.0 B

Variable types

Numeric: 8
Categorical: 2

Alerts

alert_key is highly correlated with date [High correlation]
date is highly correlated with alert_key [High correlation]
tx_time is highly correlated with amt [High correlation]
amt is highly correlated with tx_time [High correlation]
amt is highly skewed (γ1 = 82.57466783) [Skewed]
alert_key has unique values [Unique]
total_asset has 3120 (12.1%) zeros [Zeros]
tx_time has 8142 (31.6%) zeros [Zeros]
amt has 6043 (23.5%) zeros [Zeros]
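The skewness alert reports γ1 = 82.57466783 for amt. The sketch below shows one common way to compute that statistic: the bias-adjusted Fisher-Pearson coefficient, which is what pandas' `Series.skew` returns. Whether pandas-profiling uses exactly this estimator is an assumption. `adjusted_skewness` is a hypothetical helper name.

```python
import math


def adjusted_skewness(values):
    """Bias-adjusted Fisher-Pearson sample skewness (G1)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # 2nd central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # 3rd central moment
    if m2 == 0:
        return 0.0  # constant data: treat skewness as 0
    g1 = m3 / m2 ** 1.5  # moment coefficient of skewness
    # Small-sample adjustment that turns g1 into G1
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


# Many zeros plus one very large value gives a strongly positive skew,
# the same shape that triggers the alert for amt.
print(adjusted_skewness([0, 0, 0, 0, 5, 10, 1000]))
```

A positive value indicates a long right tail. For amt, 23.5% of the values are zero and the maximum (9364465262) is far above the median (66926.02), which is consistent with the very large γ1.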

Reproduction

Analysis started: 2022-12-17 15:11:37.165880
Analysis finished: 2022-12-17 15:12:01.738154
Duration: 24.57 seconds
Software version: pandas-profiling v3.4.0
Download configuration: config.json

Variables

alert_key
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 25751
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 265685.6269
Minimum: 171142
Maximum: 365073
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 402.4 KiB

Quantile statistics

Minimum: 171142
5th percentile: 176967.5
Q1: 212536
Median: 266346
Q3: 316658.5
95th percentile: 356314.5
Maximum: 365073
Range: 193931
Interquartile range (IQR): 104122.5

Descriptive statistics

Standard deviation: 58623.84087
Coefficient of variation (CV): 0.2206511566
Kurtosis: -1.273665901
Mean: 265685.6269
Median Absolute Deviation (MAD): 51941
Skewness: -0.008439744393
Sum: 6841670579
Variance: 3436754718
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common Values

Value                 Count   Frequency (%)
352249                1       < 0.1%
288385                1       < 0.1%
288417                1       < 0.1%
288413                1       < 0.1%
288404                1       < 0.1%
288403                1       < 0.1%
288401                1       < 0.1%
288399                1       < 0.1%
288397                1       < 0.1%
288395                1       < 0.1%
Other values (25741)  25741   > 99.9%

Minimum 10 values

Value   Count   Frequency (%)
171142  1       < 0.1%
171152  1       < 0.1%
171177  1       < 0.1%
171178  1       < 0.1%
171180  1       < 0.1%
171181  1       < 0.1%
171189  1       < 0.1%
171192  1       < 0.1%
171197  1       < 0.1%
171200  1       < 0.1%

Maximum 10 values

Value   Count   Frequency (%)
365073  1       < 0.1%
365009  1       < 0.1%
365008  1       < 0.1%
365004  1       < 0.1%
365001  1       < 0.1%
364999  1       < 0.1%
364996  1       < 0.1%
364995  1       < 0.1%
364994  1       < 0.1%
364993  1       < 0.1%

cust_id
Real number (ℝ≥0)

Distinct: 7708
Distinct (%): 29.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3694.211137
Minimum: 0
Maximum: 7707
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 301.8 KiB

Quantile statistics

Minimum: 0
5th percentile: 268.5
Q1: 1761
Median: 3685
Q3: 5489
95th percentile: 7291
Maximum: 7707
Range: 7707
Interquartile range (IQR): 3728

Descriptive statistics

Standard deviation: 2226.914988
Coefficient of variation (CV): 0.6028120498
Kurtosis: -1.144245677
Mean: 3694.211137
Median Absolute Deviation (MAD): 1838
Skewness: 0.05270535689
Sum: 95129631
Variance: 4959150.364
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common Values

Value                Count   Frequency (%)
120                  246     1.0%
540                  164     0.6%
255                  158     0.6%
2779                 150     0.6%
7637                 142     0.6%
1179                 141     0.5%
3782                 131     0.5%
1898                 118     0.5%
272                  118     0.5%
1057                 117     0.5%
Other values (7698)  24266   94.2%

Minimum 10 values

Value  Count   Frequency (%)
0      1       < 0.1%
1      1       < 0.1%
2      3       < 0.1%
3      4       < 0.1%
4      6       < 0.1%
5      43      0.2%
6      1       < 0.1%
7      1       < 0.1%
8      12      < 0.1%
9      1       < 0.1%

Maximum 10 values

Value  Count   Frequency (%)
7707   1       < 0.1%
7706   3       < 0.1%
7705   1       < 0.1%
7704   30      0.1%
7703   1       < 0.1%
7702   1       < 0.1%
7701   1       < 0.1%
7700   1       < 0.1%
7699   1       < 0.1%
7698   1       < 0.1%

risk_rank
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 402.4 KiB

1: 17348
3: 7448
2: 891
0: 64

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 25751
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 3
5th row: 1

Common Values

Value  Count   Frequency (%)
1      17348   67.4%
3      7448    28.9%
2      891     3.5%
0      64      0.2%

Length

[Figure: histogram of lengths of the category]

Category Frequency Plot

[Figure: category frequency plot]

Most occurring characters

Value  Count   Frequency (%)
1      17348   67.4%
3      7448    28.9%
2      891     3.5%
0      64      0.2%

Most occurring categories

Value           Count   Frequency (%)
Decimal Number  25751   100.0%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
1      17348   67.4%
3      7448    28.9%
2      891     3.5%
0      64      0.2%

Most occurring scripts

Value   Count   Frequency (%)
Common  25751   100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
1      17348   67.4%
3      7448    28.9%
2      891     3.5%
0      64      0.2%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  25751   100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
1      17348   67.4%
3      7448    28.9%
2      891     3.5%
0      64      0.2%

occupation_code
Real number (ℝ≥0)

Distinct: 21
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 14.18686653
Minimum: 0
Maximum: 20
Zeros: 118
Zeros (%): 0.5%
Negative: 0
Negative (%): 0.0%
Memory size: 402.4 KiB

Quantile statistics

Minimum: 0
5th percentile: 4
Q1: 12
Median: 15
Q3: 19
95th percentile: 19
Maximum: 20
Range: 20
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 4.775775434
Coefficient of variation (CV): 0.3366335635
Kurtosis: -0.08229778166
Mean: 14.18686653
Median Absolute Deviation (MAD): 3
Skewness: -0.8265953145
Sum: 365326
Variance: 22.808031
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=21)]

Common Values

Value              Count   Frequency (%)
19                 6305    24.5%
12                 5047    19.6%
17                 4110    16.0%
9                  2464    9.6%
18                 1236    4.8%
15                 1006    3.9%
13                 979     3.8%
5                  946     3.7%
20                 704     2.7%
14                 635     2.5%
Other values (11)  2319    9.0%

Minimum 10 values

Value  Count   Frequency (%)
0      118     0.5%
1      220     0.9%
2      130     0.5%
3      314     1.2%
4      555     2.2%
5      946     3.7%
6      1       < 0.1%
7      103     0.4%
8      135     0.5%
9      2464    9.6%

Maximum 10 values

Value  Count   Frequency (%)
20     704     2.7%
19     6305    24.5%
18     1236    4.8%
17     4110    16.0%
16     321     1.2%
15     1006    3.9%
14     635     2.5%
13     979     3.8%
12     5047    19.6%
11     351     1.4%

total_asset
Real number (ℝ≥0)

ZEROS

Distinct: 8073
Distinct (%): 31.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 713742.6661
Minimum: 0
Maximum: 73863211
Zeros: 3120
Zeros (%): 12.1%
Negative: 0
Negative (%): 0.0%
Memory size: 402.4 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 7508
Median: 128880
Q3: 597231.5
95th percentile: 2717416
Maximum: 73863211
Range: 73863211
Interquartile range (IQR): 589723.5

Descriptive statistics

Standard deviation: 2435460.555
Coefficient of variation (CV): 3.412238991
Kurtosis: 231.2987476
Mean: 713742.6661
Median Absolute Deviation (MAD): 128880
Skewness: 12.73843401
Sum: 1.83795874 × 10^10
Variance: 5.931468114 × 10^12
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common Values

Value                Count   Frequency (%)
0                    3120    12.1%
101                  145     0.6%
303                  83      0.3%
715                  79      0.3%
104                  76      0.3%
8848                 73      0.3%
102                  51      0.2%
103                  40      0.2%
18390                32      0.1%
201                  30      0.1%
Other values (8063)  22022   85.5%

Minimum 10 values

Value  Count   Frequency (%)
0      3120    12.1%
6      5       < 0.1%
7      1       < 0.1%
14     5       < 0.1%
16     6       < 0.1%
22     1       < 0.1%
24     2       < 0.1%
26     1       < 0.1%
30     3       < 0.1%
31     1       < 0.1%

Maximum 10 values

Value     Count   Frequency (%)
73863211  1       < 0.1%
54967807  1       < 0.1%
54497222  11      < 0.1%
54352277  1       < 0.1%
47869576  5       < 0.1%
47223161  1       < 0.1%
43169799  1       < 0.1%
38202504  1       < 0.1%
37641013  4       < 0.1%
35829293  5       < 0.1%

AGE
Real number (ℝ≥0)

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.63302396
Minimum: 0
Maximum: 10
Zeros: 3
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 402.4 KiB

Quantile statistics

Minimum: 0
5th percentile: 2
Q1: 3
Median: 3
Q3: 4
95th percentile: 6
Maximum: 10
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.309948141
Coefficient of variation (CV): 0.3605668874
Kurtosis: 0.6346197153
Mean: 3.63302396
Median Absolute Deviation (MAD): 1
Skewness: 0.7775862555
Sum: 93554
Variance: 1.715964133
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=11)]

Common Values

Value  Count   Frequency (%)
3      8165    31.7%
4      6498    25.2%
2      5088    19.8%
5      3531    13.7%
6      1725    6.7%
7      479     1.9%
1      93      0.4%
8      88      0.3%
9      74      0.3%
10     7       < 0.1%

Minimum 10 values

Value  Count   Frequency (%)
0      3       < 0.1%
1      93      0.4%
2      5088    19.8%
3      8165    31.7%
4      6498    25.2%
5      3531    13.7%
6      1725    6.7%
7      479     1.9%
8      88      0.3%
9      74      0.3%

Maximum 10 values

Value  Count   Frequency (%)
10     7       < 0.1%
9      74      0.3%
8      88      0.3%
7      479     1.9%
6      1725    6.7%
5      3531    13.7%
4      6498    25.2%
3      8165    31.7%
2      5088    19.8%
1      93      0.4%

date
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 262
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 198.1640325
Minimum: 0
Maximum: 393
Zeros: 88
Zeros (%): 0.3%
Negative: 0
Negative (%): 0.0%
Memory size: 402.4 KiB

Quantile statistics

Minimum: 0
5th percentile: 11
Q1: 92
Median: 210
Q3: 295
95th percentile: 376
Maximum: 393
Range: 393
Interquartile range (IQR): 203

Descriptive statistics

Standard deviation: 118.263229
Coefficient of variation (CV): 0.596794623
Kurtosis: -1.242200673
Mean: 198.1640325
Median Absolute Deviation (MAD): 102
Skewness: -0.1382121659
Sum: 5102922
Variance: 13986.19134
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common Values

Value               Count   Frequency (%)
11                  1010    3.9%
13                  410     1.6%
272                 255     1.0%
273                 239     0.9%
277                 215     0.8%
280                 203     0.8%
377                 186     0.7%
312                 182     0.7%
6                   177     0.7%
258                 161     0.6%
Other values (252)  22713   88.2%

Minimum 10 values

Value  Count   Frequency (%)
0      88      0.3%
5      152     0.6%
6      177     0.7%
7      83      0.3%
8      86      0.3%
11     1010    3.9%
12     84      0.3%
13     410     1.6%
14     94      0.4%
15     67      0.3%

Maximum 10 values

Value  Count   Frequency (%)
393    102     0.4%
392    79      0.3%
391    76      0.3%
390    77      0.3%
389    99      0.4%
386    89      0.3%
385    66      0.3%
384    66      0.3%
383    90      0.3%
382    104     0.4%

tx_time
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 257
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.896081706
Minimum: 0
Maximum: 479
Zeros: 8142
Zeros (%): 31.6%
Negative: 0
Negative (%): 0.0%
Memory size: 402.4 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 3
Q3: 6
95th percentile: 17
Maximum: 479
Range: 479
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 27.30572394
Coefficient of variation (CV): 3.959599828
Kurtosis: 136.9119409
Mean: 6.896081706
Median Absolute Deviation (MAD): 3
Skewness: 11.00664706
Sum: 177581
Variance: 745.6025598
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common Values

Value               Count   Frequency (%)
0                   8142    31.6%
1                   2395    9.3%
2                   2279    8.9%
5                   2003    7.8%
3                   1918    7.4%
4                   1759    6.8%
6                   1570    6.1%
7                   1097    4.3%
8                   864     3.4%
9                   582     2.3%
Other values (247)  3142    12.2%

Minimum 10 values

Value  Count   Frequency (%)
0      8142    31.6%
1      2395    9.3%
2      2279    8.9%
3      1918    7.4%
4      1759    6.8%
5      2003    7.8%
6      1570    6.1%
7      1097    4.3%
8      864     3.4%
9      582     2.3%

Maximum 10 values

Value  Count   Frequency (%)
479    1       < 0.1%
450    1       < 0.1%
446    1       < 0.1%
443    1       < 0.1%
439    2       < 0.1%
438    1       < 0.1%
431    1       < 0.1%
430    1       < 0.1%
425    1       < 0.1%
424    1       < 0.1%

amt
Real number (ℝ)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 18497
Distinct (%): 71.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5770835.038
Minimum: -88663278.2
Maximum: 9364465262
Zeros: 6043
Zeros (%): 23.5%
Negative: 2
Negative (%): < 0.1%
Memory size: 402.4 KiB

Quantile statistics

Minimum: -88663278.2
5th percentile: 0
Q1: 173.5
Median: 66926.02
Q3: 778552.6175
95th percentile: 18359600.9
Maximum: 9364465262
Range: 9453128541
Interquartile range (IQR): 778379.1175

Descriptive statistics

Standard deviation: 92210705.88
Coefficient of variation (CV): 15.97874576
Kurtosis: 8073.687761
Mean: 5770835.038
Median Absolute Deviation (MAD): 66926.02
Skewness: 82.57466783
Sum: 1.486047731 × 10^11
Variance: 8.50281428 × 10^15
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common Values

Value                 Count   Frequency (%)
0                     6043    23.5%
4                     26      0.1%
8                     15      0.1%
1031                  13      0.1%
20                    11      < 0.1%
12                    10      < 0.1%
1                     9       < 0.1%
17                    8       < 0.1%
1043                  8       < 0.1%
516                   7       < 0.1%
Other values (18487)  19601   76.1%

Minimum 10 values

Value        Count   Frequency (%)
-88663278.2  1       < 0.1%
-997441      1       < 0.1%
0            6043    23.5%
1            9       < 0.1%
2            4       < 0.1%
3            7       < 0.1%
4            26      0.1%
5            7       < 0.1%
6            4       < 0.1%
7            4       < 0.1%

Maximum 10 values

Value       Count   Frequency (%)
9364465262  1       < 0.1%
9233717521  1       < 0.1%
2440307900  1       < 0.1%
2381178463  1       < 0.1%
1569323372  1       < 0.1%
1566106409  1       < 0.1%
1521121594  1       < 0.1%
1498621080  1       < 0.1%
1445719427  1       < 0.1%
1381436111  1       < 0.1%

sar_flag
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 402.4 KiB

0.0: 25517
1.0: 234

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 77253
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value  Count   Frequency (%)
0.0    25517   99.1%
1.0    234     0.9%

Length

[Figure: histogram of lengths of the category]

Category Frequency Plot

[Figure: category frequency plot]

Most occurring characters

Value  Count   Frequency (%)
0      51268   66.4%
.      25751   33.3%
1      234     0.3%

Most occurring categories

Value              Count   Frequency (%)
Decimal Number     51502   66.7%
Other Punctuation  25751   33.3%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
0      51268   99.5%
1      234     0.5%
Other Punctuation
Value  Count   Frequency (%)
.      25751   100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  77253   100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
0      51268   66.4%
.      25751   33.3%
1      234     0.3%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  77253   100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
0      51268   66.4%
.      25751   33.3%
1      234     0.3%

Interactions

[Figures: pairwise interaction plots between the numeric variables]

Correlations


Auto

The auto setting chooses an easily interpretable pairwise metric for each pair of columns based on the variable types of the pair: Cramér's V for categorical-categorical pairs, Cramér's V on a discretized numerical column for numerical-categorical pairs, and Spearman's ρ for numerical-numerical pairs.
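The mapping above can be expressed as a small lookup. This is a minimal sketch: the function `auto_method` and the type strings "numerical" and "categorical" are hypothetical and are not part of pandas-profiling's API. The function only returns the name of the measure to apply. It does not perform the discretization itself.

```python
def auto_method(type_a: str, type_b: str) -> str:
    """Return the association measure the 'auto' mapping assigns to a column pair."""
    allowed = {"numerical", "categorical"}
    if type_a not in allowed or type_b not in allowed:
        raise ValueError(f"unsupported variable types: {type_a!r}, {type_b!r}")
    if type_a == type_b == "numerical":
        return "spearman"
    # categorical-categorical pairs, and numerical-categorical pairs after the
    # numerical column has been discretized into bins
    return "cramers_v"


# A few column pairs from this report, typed as in the Variables section
types = {"amt": "numerical", "tx_time": "numerical", "sar_flag": "categorical"}
print(auto_method(types["amt"], types["tx_time"]))
print(auto_method(types["amt"], types["sar_flag"]))
```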

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, which makes it better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1. A value of -1 indicates a perfectly negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates a perfectly positive monotonic correlation.

To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations.
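Below is a pure-Python sketch of that definition. First, each variable is replaced by its ranks. Then the covariance of the ranks is divided by the product of their standard deviations. Giving tied values the average of their positions is an assumption, though it is common practice. The helper names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` over a run of values equal to values[order[start]]
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Covariance of the rank variables divided by the product of their std devs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in ry) / n)
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when a variable is constant")
    return cov / (sx * sy)


# A monotonic but nonlinear relationship: the ranks agree perfectly, so rho is 1.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]
print(spearman_rho(x, y))
```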

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1. A value of -1 indicates a perfectly negative linear correlation, 0 indicates no linear correlation, and +1 indicates a perfectly positive linear correlation. r is also invariant under separate changes in location and (positive) scale of the two variables. As a result, for an exact linear relationship, the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
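This sketch implements the covariance-over-standard-deviations definition directly. It then checks the invariance property: shifting and positively rescaling either input leaves r unchanged, and any exact linear relationship gives |r| = 1. `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, at least 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (sx * sy)


x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 3.5, 9.0]
r = pearson_r(x, y)
# Separate changes of location and positive scale leave r unchanged.
r_moved = pearson_r([10 * a - 5 for a in x], [0.5 * b + 100 for b in y])
# An exact linear relationship gives |r| = 1 whatever the slope's size.
print(r, r_moved, pearson_r(x, [-3 * a + 1 for a in x]))
```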

Kendall's τ

Like Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1. A value of -1 indicates a perfectly negative correlation, 0 indicates no correlation, and +1 indicates a perfectly positive correlation.

To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
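The following brute-force sketch counts concordant and discordant pairs as just described. The formula in the text is usually called τ-a. This dataset contains many tied values (for example, the zeros in tx_time and amt), so the sketch also offers the tie-adjusted τ-b. Whether the report uses τ-b is an assumption, and `kendall_tau` is a hypothetical name. Because the sketch visits all n(n-1)/2 pairs, it is only practical for small inputs: the 25751 rows here would mean 331,544,125 pairs.

```python
import math
from itertools import combinations


def kendall_tau(x, y, variant="a"):
    """Kendall's tau by counting concordant and discordant pairs.

    variant="a": (C - D) / (n(n-1)/2)
    variant="b": (C - D) / sqrt((n0 - Tx) * (n0 - Ty)), which adjusts for ties
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, at least 2")
    concordant = discordant = tied_x = tied_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0:
            tied_x += 1
        if dy == 0:
            tied_y += 1
        if dx * dy > 0:
            concordant += 1  # both variables move in the same direction
        elif dx * dy < 0:
            discordant += 1  # they move in opposite directions
    n0 = len(x) * (len(x) - 1) // 2  # total number of pairs
    if variant == "a":
        return (concordant - discordant) / n0
    if variant == "b":
        denom = math.sqrt((n0 - tied_x) * (n0 - tied_y))
        if denom == 0:
            raise ValueError("tau-b is undefined when a variable is constant")
        return (concordant - discordant) / denom
    raise ValueError(f"unknown variant: {variant!r}")


x = [1, 2, 2, 3]
y = [1, 2, 3, 4]
print(kendall_tau(x, y, "a"), kendall_tau(x, y, "b"))
```

With a tie in x, τ-a gives 5/6 because the tied pair still counts in the denominator. τ-b instead removes that pair from the x side of its denominator.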

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma in 2013.
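The next sketch computes a bias-corrected Cramér's V from a contingency table of two label sequences, using Bergsma's correction formulas. A few points are assumptions: using Pearson's chi-squared statistic without a continuity correction, and the helper name `cramers_v_corrected`. For a numerical-categorical pair, the numerical column would first have to be discretized into bins.

```python
import math
from collections import Counter


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V (Bergsma's correction) for two label sequences."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("a and b need the same length, at least 2")
    rows, cols = Counter(a), Counter(b)
    r, k = len(rows), len(cols)
    if r < 2 or k < 2:
        raise ValueError("each variable needs at least two categories")
    joint = Counter(zip(a, b))  # contingency table as {(row, col): count}

    # Pearson's chi-squared statistic against the independence model
    chi2 = 0.0
    for row, n_row in rows.items():
        for col, n_col in cols.items():
            expected = n_row * n_col / n
            chi2 += (joint[(row, col)] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bergsma's bias corrections for phi^2 and for the table dimensions
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        raise ValueError("too few observations for the correction")
    return math.sqrt(phi2_corr / denom)


# Perfect association versus independence on 100 observations
print(cramers_v_corrected(["x", "y"] * 50, ["p", "q"] * 50))
print(cramers_v_corrected(["x", "x", "y", "y"] * 25, ["p", "q", "p", "q"] * 25))
```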

Phik (φk)

Phik (φk) is a newer, practical correlation coefficient. It works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution.

Missing values

[Figure: nullity by column]
A simple visualization of nullity by column.
[Figure: nullity matrix]
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Sample

First rows

alert_key  cust_id  risk_rank  occupation_code  total_asset  AGE  date  tx_time  amt  sar_flag
03522493912119.01465816.0736524.01.286508e+070.0
1352253539312.098177.023655.02.054065e+070.0
23522546924119.02052922.0736513.01.394017e+060.0
33522803431315.0201906.0536513.06.514627e+060.0
435228284112.07450.053651.01.047400e+040.0
53522913932117.0182242.053653.02.320450e+050.0
63522987637312.02422.043654.03.339420e+050.0
73523011317119.02536600.043651.02.647000e+030.0
83523024769117.0173255.043654.08.643975e+050.0
93523054063312.0876408.0336513.01.560263e+070.0

Last rows

alert_key  cust_id  risk_rank  occupation_code  total_asset  AGE  date  tx_time  amt  sar_flag
257413521113066117.0829412.0636410.0169170.0000.0
25742352114599712.0444392.033646.0829302.0450.0
257433521185306115.0242106.043645.050632.1850.0
257443521197209113.040907.023641.013719.0000.0
257453521203910117.0114439.023646.04251483.8150.0
257463521235500117.012207.023641.0176612.0000.0
25747352124188117.0259985.043641.049784.0000.0
25748352125336319.0928963.033641.06900402.0000.0
257493521287704319.021647.043643.015680.0000.0
257503521325923119.03218731.033642.02837081.9160.0